# Imbalanced Regression Dataset Repository

This repository provides access to datasets tailored for benchmarking regression models under imbalanced target distributions.

The datasets include metadata such as feature types, presence of missing values, and relevance of extreme values.
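
As a quick sanity check on the table below, the `% Rare` column appears to be the `# Rare` count divided by `Instances`, shown as a percentage with two decimals. The sketch below illustrates that relationship under this assumption. The helper name `percent_rare` is hypothetical and is not part of this repository.

```python
def percent_rare(n_rare: int, n_instances: int) -> str:
    """Format the share of rare instances the way the '% Rare' column does.

    Hypothetical helper for illustration; assumes % Rare = # Rare / Instances.
    """
    if n_instances <= 0:
        raise ValueError("n_instances must be positive")
    return f"{100 * n_rare / n_instances:.2f}%"


# (# Rare, Instances) pairs copied from the table rows for these datasets.
rows = {"a7": (14, 198), "abalone": (1033, 4177), "airfoil": (80, 1503)}

for name, (n_rare, n_instances) in rows.items():
    print(name, percent_rare(n_rare, n_instances))
```

Rows listed with `# Rare` equal to 0 contain no instances above the 0.8 relevance threshold, so their `% Rare` is 0.00%.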

## Available Datasets

<div style='font-size:9px; line-height:1.2;'>
<table style='border-collapse:collapse;'>
<thead><tr><th style='border:1px solid #ccc; padding:2px 4px;'>Dataset</th><th style='border:1px solid #ccc; padding:2px 4px;'>Description</th><th style='border:1px solid #ccc; padding:2px 4px;'>Features</th><th style='border:1px solid #ccc; padding:2px 4px;'>Nominal</th><th style='border:1px solid #ccc; padding:2px 4px;'>Numeric</th><th style='border:1px solid #ccc; padding:2px 4px;'>Instances</th><th style='border:1px solid #ccc; padding:2px 4px;'>Missing</th><th style='border:1px solid #ccc; padding:2px 4px;'>Type of Extreme</th><th style='border:1px solid #ccc; padding:2px 4px;'>Relevance Threshold</th><th style='border:1px solid #ccc; padding:2px 4px;'># Rare</th><th style='border:1px solid #ccc; padding:2px 4px;'>% Rare</th><th style='border:1px solid #ccc; padding:2px 4px;'>Target Variable</th><th style='border:1px solid #ccc; padding:2px 4px;'>Source</th></tr></thead>
<tbody>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>a1</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data points are taken on an annual basis from various streams / rivers in Europe, compiling features aimed at predicting the concentrations of seven algae species.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>198</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>a1</td><td style='border:1px solid #ccc; padding:2px 4px;'>[1]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>a2</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data points are taken on an annual basis from various streams / rivers in Europe, compiling features aimed at predicting the concentrations of seven algae species.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>198</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>a2</td><td style='border:1px solid #ccc; padding:2px 4px;'>[1]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>a3</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data points are taken on an annual basis from various streams / rivers in Europe, compiling features aimed at predicting the concentrations of seven algae species.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>198</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>a3</td><td style='border:1px solid #ccc; padding:2px 4px;'>[1]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>a4</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data points are taken on an annual basis from various streams / rivers in Europe, compiling features aimed at predicting the concentrations of seven algae species.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>198</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>a4</td><td style='border:1px solid #ccc; padding:2px 4px;'>[1]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>a6</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data points are taken on an annual basis from various streams / rivers in Europe, compiling features aimed at predicting the concentrations of seven algae species.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>198</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>a6</td><td style='border:1px solid #ccc; padding:2px 4px;'>[1]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>a7</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data points are taken on an annual basis from various streams / rivers in Europe, compiling features aimed at predicting the concentrations of seven algae species.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>198</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>14</td><td style='border:1px solid #ccc; padding:2px 4px;'>7.07%</td><td style='border:1px solid #ccc; padding:2px 4px;'>a7</td><td style='border:1px solid #ccc; padding:2px 4px;'>[1]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>abalone</td><td style='border:1px solid #ccc; padding:2px 4px;'>Predict the age of abalone from physical measurements.</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1</td><td style='border:1px solid #ccc; padding:2px 4px;'>7</td><td style='border:1px solid #ccc; padding:2px 4px;'>4177</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1033</td><td style='border:1px solid #ccc; padding:2px 4px;'>24.73%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Rings</td><td style='border:1px solid #ccc; padding:2px 4px;'>[2]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>acceleration</td><td style='border:1px solid #ccc; padding:2px 4px;'>Dataset with acceleration statistics.</td><td style='border:1px solid #ccc; padding:2px 4px;'>14</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>1732</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>158</td><td style='border:1px solid #ccc; padding:2px 4px;'>9.12%</td><td style='border:1px solid #ccc; padding:2px 4px;'>acceleration</td><td style='border:1px solid #ccc; padding:2px 4px;'>[3]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>ailerons</td><td style='border:1px solid #ccc; padding:2px 4px;'>The attributes describe the status of the aeroplane, while the goal is to predict the control action on the ailerons of the aircraft.</td><td style='border:1px solid #ccc; padding:2px 4px;'>40</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>40</td><td style='border:1px solid #ccc; padding:2px 4px;'>13515</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1622</td><td style='border:1px solid #ccc; padding:2px 4px;'>11.80%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Goal</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>airfoil</td><td style='border:1px solid #ccc; padding:2px 4px;'>NASA data set, obtained from a series of aerodynamic and acoustic tests of two and three-dimensional airfoil blade sections conducted in an anechoic wind tunnel.</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>1503</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>80</td><td style='border:1px solid #ccc; padding:2px 4px;'>5.32%</td><td style='border:1px solid #ccc; padding:2px 4px;'>scaled-sound-pressure</td><td style='border:1px solid #ccc; padding:2px 4px;'>[5]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>anacalt</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data contains information about the decisions taken by a supreme court.</td><td style='border:1px solid #ccc; padding:2px 4px;'>7</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>7</td><td style='border:1px solid #ccc; padding:2px 4px;'>4052</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Log_exposure</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>appliances_energy</td><td style='border:1px solid #ccc; padding:2px 4px;'>Experimental data used to create regression models of appliances energy use in a low energy building.</td><td style='border:1px solid #ccc; padding:2px 4px;'>27</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>27</td><td style='border:1px solid #ccc; padding:2px 4px;'>19735</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>8212</td><td style='border:1px solid #ccc; padding:2px 4px;'>41.61%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Appliances</td><td style='border:1px solid #ccc; padding:2px 4px;'>[6]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>autoprices</td><td style='border:1px solid #ccc; padding:2px 4px;'>Dataset with features used to predict the price of an automobile.</td><td style='border:1px solid #ccc; padding:2px 4px;'>16</td><td style='border:1px solid #ccc; padding:2px 4px;'>1</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>159</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>class</td><td style='border:1px solid #ccc; padding:2px 4px;'>[7]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>availablePower</td><td style='border:1px solid #ccc; padding:2px 4px;'>Dataset with power statistics.</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>7</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1802</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>305</td><td style='border:1px solid #ccc; padding:2px 4px;'>16.93%</td><td style='border:1px solid #ccc; padding:2px 4px;'>available.power</td><td style='border:1px solid #ccc; padding:2px 4px;'>[8]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>bank8FM</td><td style='border:1px solid #ccc; padding:2px 4px;'>Part of a family of datasets synthetically generated from a simulation of how bank-customers choose their banks.</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>4499</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>rej</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>baseball</td><td style='border:1px solid #ccc; padding:2px 4px;'>This dataset contains the 1992 salaries of the set of Major League Baseball players who played at least one game in both the 1991 and 1992 seasons, excluding pitchers.</td><td style='border:1px solid #ccc; padding:2px 4px;'>16</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>16</td><td style='border:1px solid #ccc; padding:2px 4px;'>337</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Salary</td><td style='border:1px solid #ccc; padding:2px 4px;'>[10]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>californiaHousing</td><td style='border:1px solid #ccc; padding:2px 4px;'>This data set contains information about all the block groups in California from the 1990 Census.</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>20640</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1802</td><td style='border:1px solid #ccc; padding:2px 4px;'>8.73%</td><td style='border:1px solid #ccc; padding:2px 4px;'>MedianHouseValue</td><td style='border:1px solid #ccc; padding:2px 4px;'>[11]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>cocomo</td><td style='border:1px solid #ccc; padding:2px 4px;'>Software Engineering Repository data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering.</td><td style='border:1px solid #ccc; padding:2px 4px;'>16</td><td style='border:1px solid #ccc; padding:2px 4px;'>1</td><td style='border:1px solid #ccc; padding:2px 4px;'>1</td><td style='border:1px solid #ccc; padding:2px 4px;'>60</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>14</td><td style='border:1px solid #ccc; padding:2px 4px;'>23.33%</td><td style='border:1px solid #ccc; padding:2px 4px;'>ACT_EFFORT</td><td style='border:1px solid #ccc; padding:2px 4px;'>[12]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>concrete</td><td style='border:1px solid #ccc; padding:2px 4px;'>Concrete Compressive Strength data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1030</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>ConcreteCompressiveStrength</td><td style='border:1px solid #ccc; padding:2px 4px;'>[13]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>cpuActiv</td><td style='border:1px solid #ccc; padding:2px 4px;'>Computer activity data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>21</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>21</td><td style='border:1px solid #ccc; padding:2px 4px;'>8192</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>371</td><td style='border:1px solid #ccc; padding:2px 4px;'>4.53%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Usr</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>cpuSm</td><td style='border:1px solid #ccc; padding:2px 4px;'>The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user university department. The final dataset is taken from both occasions with equal numbers of observations coming from each collection epoch.</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>8192</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>371</td><td style='border:1px solid #ccc; padding:2px 4px;'>4.53%</td><td style='border:1px solid #ccc; padding:2px 4px;'>usr</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>debutanizer</td><td style='border:1px solid #ccc; padding:2px 4px;'>This dataset aims to predict the butane concentration on a Debutanizer column.</td><td style='border:1px solid #ccc; padding:2px 4px;'>7</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>7</td><td style='border:1px solid #ccc; padding:2px 4px;'>2394</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>212</td><td style='border:1px solid #ccc; padding:2px 4px;'>8.86%</td><td style='border:1px solid #ccc; padding:2px 4px;'>y</td><td style='border:1px solid #ccc; padding:2px 4px;'>[14]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>deltaAirlerons</td><td style='border:1px solid #ccc; padding:2px 4px;'>This data set is also obtained from the task of controlling the ailerons of an F16 aircraft.</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>7129</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1206</td><td style='border:1px solid #ccc; padding:2px 4px;'>16.92%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Sa</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>deltaElevators</td><td style='border:1px solid #ccc; padding:2px 4px;'>This data set is also obtained from the task of controlling the elevators of an F16 aircraft.</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>9517</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>4785</td><td style='border:1px solid #ccc; padding:2px 4px;'>50.28%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Se</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>diabetes</td><td style='border:1px solid #ccc; padding:2px 4px;'>This data set concerns the study of the factors affecting patterns of insulin-dependent diabetes mellitus in children.</td><td style='border:1px solid #ccc; padding:2px 4px;'>2</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>2</td><td style='border:1px solid #ccc; padding:2px 4px;'>43</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>13.95%</td><td style='border:1px solid #ccc; padding:2px 4px;'>C_peptide</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>ele-1</td><td style='border:1px solid #ccc; padding:2px 4px;'>Electrical Length data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>2</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>2</td><td style='border:1px solid #ccc; padding:2px 4px;'>495</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>21</td><td style='border:1px solid #ccc; padding:2px 4px;'>4.24%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Length</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>ele-2</td><td style='border:1px solid #ccc; padding:2px 4px;'>Electrical-Maintenance data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>4</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>4</td><td style='border:1px solid #ccc; padding:2px 4px;'>1056</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Y</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>elevators</td><td style='border:1px solid #ccc; padding:2px 4px;'>The attributes describe the status of the aeroplane, while the goal is to predict the control action on the elevators of the aircraft.</td><td style='border:1px solid #ccc; padding:2px 4px;'>18</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>18</td><td style='border:1px solid #ccc; padding:2px 4px;'>16599</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>4390</td><td style='border:1px solid #ccc; padding:2px 4px;'>26.45%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Goal</td><td style='border:1px solid #ccc; padding:2px 4px;'>[15]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>forestFires</td><td style='border:1px solid #ccc; padding:2px 4px;'>Forest Fires data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>517</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>2.90%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Area</td><td style='border:1px solid #ccc; padding:2px 4px;'>[16]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>friedman</td><td style='border:1px solid #ccc; padding:2px 4px;'>Friedman Benchmark Function data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>1200</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Output</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>fuelConsumption</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data contains information about car emissions and fuel consumption.</td><td style='border:1px solid #ccc; padding:2px 4px;'>37</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>25</td><td style='border:1px solid #ccc; padding:2px 4px;'>1764</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>167</td><td style='border:1px solid #ccc; padding:2px 4px;'>9.47%</td><td style='border:1px solid #ccc; padding:2px 4px;'>fuel.counsumption.country</td><td style='border:1px solid #ccc; padding:2px 4px;'>[17]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>geographical_origin_music</td><td style='border:1px solid #ccc; padding:2px 4px;'>Instances in this dataset contain audio features extracted from 1059 wave files. The task associated with the data is to predict the geographical origin of music.</td><td style='border:1px solid #ccc; padding:2px 4px;'>117</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>117</td><td style='border:1px solid #ccc; padding:2px 4px;'>1059</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>104</td><td style='border:1px solid #ccc; padding:2px 4px;'>9.82%</td><td style='border:1px solid #ccc; padding:2px 4px;'>V100</td><td style='border:1px solid #ccc; padding:2px 4px;'>[18]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>heat</td><td style='border:1px solid #ccc; padding:2px 4px;'>Dataset with heating statistics.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>7400</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>833</td><td style='border:1px solid #ccc; padding:2px 4px;'>11.26%</td><td style='border:1px solid #ccc; padding:2px 4px;'>heat</td><td style='border:1px solid #ccc; padding:2px 4px;'>[8]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>house16H</td><td style='border:1px solid #ccc; padding:2px 4px;'>This database was designed on the basis of data provided by US Census Bureau.</td><td style='border:1px solid #ccc; padding:2px 4px;'>16</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>16</td><td style='border:1px solid #ccc; padding:2px 4px;'>22784</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>6098</td><td style='border:1px solid #ccc; padding:2px 4px;'>26.76%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Price</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>housing</td><td style='border:1px solid #ccc; padding:2px 4px;'>The Ames Housing Dataset is a well-known dataset in the field of machine learning and data analysis. It contains various features and attributes of residential homes in Ames, Iowa, USA.</td><td style='border:1px solid #ccc; padding:2px 4px;'>79</td><td style='border:1px solid #ccc; padding:2px 4px;'>43</td><td style='border:1px solid #ccc; padding:2px 4px;'>36</td><td style='border:1px solid #ccc; padding:2px 4px;'>1460</td><td style='border:1px solid #ccc; padding:2px 4px;'>Yes (7829)</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>179</td><td style='border:1px solid #ccc; padding:2px 4px;'>12.26%</td><td style='border:1px solid #ccc; padding:2px 4px;'>SalePrice</td><td style='border:1px solid #ccc; padding:2px 4px;'>[19]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>housingBoston</td><td style='border:1px solid #ccc; padding:2px 4px;'>This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass.</td><td style='border:1px solid #ccc; padding:2px 4px;'>13</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>13</td><td style='border:1px solid #ccc; padding:2px 4px;'>506</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>105</td><td style='border:1px solid #ccc; padding:2px 4px;'>20.75%</td><td style='border:1px solid #ccc; padding:2px 4px;'>HousValue</td><td style='border:1px solid #ccc; padding:2px 4px;'>[20]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>kdd_coil_1</td><td style='border:1px solid #ccc; padding:2px 4px;'>This data set is from the 1999 Computational Intelligence and Learning (COIL) competition. The data contains measurements of river chemical concentrations and algae densities.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>316</td><td style='border:1px solid #ccc; padding:2px 4px;'>Yes (56)</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>algae_1</td><td style='border:1px solid #ccc; padding:2px 4px;'>[21]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>kinematics8nm</td><td style='border:1px solid #ccc; padding:2px 4px;'>This data set is concerned with the forward kinematics of an 8-link robot arm.</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>8192</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>y</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>laser</td><td style='border:1px solid #ccc; padding:2px 4px;'>Laser generated data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>4</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>4</td><td style='border:1px solid #ccc; padding:2px 4px;'>993</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Output</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>lungcancer-shedden</td><td style='border:1px solid #ccc; padding:2px 4px;'>Prediction in Lung Adenocarcinoma</td><td style='border:1px solid #ccc; padding:2px 4px;'>23</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>20</td><td style='border:1px solid #ccc; padding:2px 4px;'>442</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>2.71%</td><td style='border:1px solid #ccc; padding:2px 4px;'>OS_years</td><td style='border:1px solid #ccc; padding:2px 4px;'>[22]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>machineCPU</td><td style='border:1px solid #ccc; padding:2px 4px;'>Machine CPU Performance data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>209</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>46</td><td style='border:1px solid #ccc; padding:2px 4px;'>22.01%</td><td style='border:1px solid #ccc; padding:2px 4px;'>PRP</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>maxTorque</td><td style='border:1px solid #ccc; padding:2px 4px;'>Dataset with torque statistics.</td><td style='border:1px solid #ccc; padding:2px 4px;'>32</td><td style='border:1px solid #ccc; padding:2px 4px;'>13</td><td style='border:1px solid #ccc; padding:2px 4px;'>19</td><td style='border:1px solid #ccc; padding:2px 4px;'>1802</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>235</td><td style='border:1px solid #ccc; padding:2px 4px;'>13.04%</td><td style='border:1px solid #ccc; padding:2px 4px;'>maximal.torque</td><td style='border:1px solid #ccc; padding:2px 4px;'>[23]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>meta</td><td style='border:1px solid #ccc; padding:2px 4px;'>Meta-Data was used in order to give advice about which classification method is appropriate for a particular dataset.</td><td style='border:1px solid #ccc; padding:2px 4px;'>21</td><td style='border:1px solid #ccc; padding:2px 4px;'>2</td><td style='border:1px solid #ccc; padding:2px 4px;'>19</td><td style='border:1px solid #ccc; padding:2px 4px;'>528</td><td style='border:1px solid #ccc; padding:2px 4px;'>Yes (504)</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>165</td><td style='border:1px solid #ccc; padding:2px 4px;'>31.25%</td><td style='border:1px solid #ccc; padding:2px 4px;'>class</td><td style='border:1px solid #ccc; padding:2px 4px;'>[24]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>mortgage</td><td style='border:1px solid #ccc; padding:2px 4px;'>Mortgage data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>1049</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>133</td><td style='border:1px solid #ccc; padding:2px 4px;'>12.68%</td><td style='border:1px solid #ccc; padding:2px 4px;'>30Y-CMortgageRate</td><td style='border:1px solid #ccc; padding:2px 4px;'>[25]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>pdgfr</td><td style='border:1px solid #ccc; padding:2px 4px;'>This is one of 41 drug design datasets.</td><td style='border:1px solid #ccc; padding:2px 4px;'>320</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>320</td><td style='border:1px solid #ccc; padding:2px 4px;'>79</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>18.99%</td><td style='border:1px solid #ccc; padding:2px 4px;'>oz322</td><td style='border:1px solid #ccc; padding:2px 4px;'>[26]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>pollen</td><td style='border:1px solid #ccc; padding:2px 4px;'>This dataset is synthetic. It was generated by David Coleman at RCA Laboratories in Princeton, N.J.</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>3848</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>242</td><td style='border:1px solid #ccc; padding:2px 4px;'>6.29%</td><td style='border:1px solid #ccc; padding:2px 4px;'>DENSITY</td><td style='border:1px solid #ccc; padding:2px 4px;'>[27]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>puma32NH</td><td style='border:1px solid #ccc; padding:2px 4px;'>This is a family of datasets synthetically generated from a realistic simulation of the dynamics of a Unimation Puma 560 robot arm.</td><td style='border:1px solid #ccc; padding:2px 4px;'>32</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>32</td><td style='border:1px solid #ccc; padding:2px 4px;'>8192</td><td style='border:1px solid #ccc; padding:2px 4px;'>Yes (33)</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>thetadd6</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>puma8NH</td><td style='border:1px solid #ccc; padding:2px 4px;'>This is a family of datasets synthetically generated from a realistic simulation of the dynamics of a Unimation Puma 560 robot arm.</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>8</td><td style='border:1px solid #ccc; padding:2px 4px;'>8192</td><td style='border:1px solid #ccc; padding:2px 4px;'>Yes (9)</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>thetadd3</td><td style='border:1px solid #ccc; padding:2px 4px;'>[9]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>quake</td><td style='border:1px solid #ccc; padding:2px 4px;'>Quake data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>3</td><td style='border:1px solid #ccc; padding:2px 4px;'>2178</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Richter</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>sensory</td><td style='border:1px solid #ccc; padding:2px 4px;'>Data for the sensory evaluation experiment in Brien, C.J. and Payne, R.W. (1996) Tiers, structure formulae and the analysis of complicated experiments.</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>576</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>69</td><td style='border:1px solid #ccc; padding:2px 4px;'>11.98%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Score</td><td style='border:1px solid #ccc; padding:2px 4px;'>[28]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>servo</td><td style='border:1px solid #ccc; padding:2px 4px;'>This is an interesting collection of data provided by Karl Ulrich. It covers an extremely non-linear phenomenon - predicting the rise time of a servomechanism in terms of two (continuous) gain settings and two (discrete) choices of mechanical linkages.</td><td style='border:1px solid #ccc; padding:2px 4px;'>4</td><td style='border:1px solid #ccc; padding:2px 4px;'>4</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>167</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>59</td><td style='border:1px solid #ccc; padding:2px 4px;'>35.33%</td><td style='border:1px solid #ccc; padding:2px 4px;'>class</td><td style='border:1px solid #ccc; padding:2px 4px;'>[15]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>space_ga</td><td style='border:1px solid #ccc; padding:2px 4px;'>The dataset contains 3,107 observations on U.S. county votes cast in the 1980 presidential election.</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>3107</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>182</td><td style='border:1px solid #ccc; padding:2px 4px;'>5.86%</td><td style='border:1px solid #ccc; padding:2px 4px;'>ln_votes_pop</td><td style='border:1px solid #ccc; padding:2px 4px;'>[29]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>stock</td><td style='border:1px solid #ccc; padding:2px 4px;'>Stock Prices data set</td><td style='border:1px solid #ccc; padding:2px 4px;'>9</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>9</td><td style='border:1px solid #ccc; padding:2px 4px;'>950</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Company10</td><td style='border:1px solid #ccc; padding:2px 4px;'>[30]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>strikes</td><td style='border:1px solid #ccc; padding:2px 4px;'>The data consist of annual observations on the level of strike volume (days lost due to industrial disputes per 1000 wage salary earners), and their covariates in 18 OECD countries from 1951-1985.</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>625</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>2.40%</td><td style='border:1px solid #ccc; padding:2px 4px;'>strike_volume</td><td style='border:1px solid #ccc; padding:2px 4px;'>[31]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>sulfer_1</td><td style='border:1px solid #ccc; padding:2px 4px;'>The sulfur recovery unit (SRU) removes environmental pollutants from acid gas streams before they are released into the atmosphere. Furthermore, elemental sulfur is recovered as a valuable by-product.</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>10081</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1117</td><td style='border:1px solid #ccc; padding:2px 4px;'>11.08%</td><td style='border:1px solid #ccc; padding:2px 4px;'>y1</td><td style='border:1px solid #ccc; padding:2px 4px;'>[32]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>sulfer_2</td><td style='border:1px solid #ccc; padding:2px 4px;'>The sulfur recovery unit (SRU) removes environmental pollutants from acid gas streams before they are released into the atmosphere. Furthermore, elemental sulfur is recovered as a valuable by-product.</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>5</td><td style='border:1px solid #ccc; padding:2px 4px;'>10081</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>1444</td><td style='border:1px solid #ccc; padding:2px 4px;'>14.32%</td><td style='border:1px solid #ccc; padding:2px 4px;'>y2</td><td style='border:1px solid #ccc; padding:2px 4px;'>[32]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>supercondutivity</td><td style='border:1px solid #ccc; padding:2px 4px;'>Two files contain data on 21263 superconductors and their relevant features.</td><td style='border:1px solid #ccc; padding:2px 4px;'>81</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>81</td><td style='border:1px solid #ccc; padding:2px 4px;'>21263</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>critical_temp</td><td style='border:1px solid #ccc; padding:2px 4px;'>[33]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>treasury</td><td style='border:1px solid #ccc; padding:2px 4px;'>This file contains the Economic data information of USA from 01/04/1980 to 02/04/2000 on a weekly basis.</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>15</td><td style='border:1px solid #ccc; padding:2px 4px;'>1049</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Low</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>137</td><td style='border:1px solid #ccc; padding:2px 4px;'>13.06%</td><td style='border:1px solid #ccc; padding:2px 4px;'>1MonthCDRate</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>triazines</td><td style='border:1px solid #ccc; padding:2px 4px;'>A triazine dataset. The goal is to predict the inhibition of dihydrofolate reductase by triazines.</td><td style='border:1px solid #ccc; padding:2px 4px;'>60</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>60</td><td style='border:1px solid #ccc; padding:2px 4px;'>186</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>23</td><td style='border:1px solid #ccc; padding:2px 4px;'>12.37%</td><td style='border:1px solid #ccc; padding:2px 4px;'>activity</td><td style='border:1px solid #ccc; padding:2px 4px;'>[34]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>wankara</td><td style='border:1px solid #ccc; padding:2px 4px;'>This file contains the weather information of Ankara from 01/01/1994 to 28/05/1998.</td><td style='border:1px solid #ccc; padding:2px 4px;'>9</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>9</td><td style='border:1px solid #ccc; padding:2px 4px;'>1609</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Mean_temperature</td><td style='border:1px solid #ccc; padding:2px 4px;'>[4]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>wine-quality</td><td style='border:1px solid #ccc; padding:2px 4px;'>The two datasets are combined and related to red and white variants of the Portuguese "Vinho Verde" wine.</td><td style='border:1px solid #ccc; padding:2px 4px;'>12</td><td style='border:1px solid #ccc; padding:2px 4px;'>1</td><td style='border:1px solid #ccc; padding:2px 4px;'>11</td><td style='border:1px solid #ccc; padding:2px 4px;'>6497</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>High</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>4113</td><td style='border:1px solid #ccc; padding:2px 4px;'>63.31%</td><td style='border:1px solid #ccc; padding:2px 4px;'>quality</td><td style='border:1px solid #ccc; padding:2px 4px;'>[35]</td></tr>
<tr><td style='border:1px solid #ccc; padding:2px 4px;'>yachtHydrodynamics</td><td style='border:1px solid #ccc; padding:2px 4px;'>Delft data set, used to predict the hydrodynamic performance of sailing yachts from dimensions and velocity.</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>6</td><td style='border:1px solid #ccc; padding:2px 4px;'>308</td><td style='border:1px solid #ccc; padding:2px 4px;'>No</td><td style='border:1px solid #ccc; padding:2px 4px;'>Both</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.8</td><td style='border:1px solid #ccc; padding:2px 4px;'>0</td><td style='border:1px solid #ccc; padding:2px 4px;'>0.00%</td><td style='border:1px solid #ccc; padding:2px 4px;'>Residuary_Resistance</td><td style='border:1px solid #ccc; padding:2px 4px;'>[36]</td></tr>
</tbody></table></div>
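
The `% Rare` column appears to be the `# Rare` count divided by `Instances`, rounded to two decimals. The sketch below checks that reading against a few rows copied from the table. The helper name `percent_rare` and the `rows` list are illustrative only and are not part of this repository.

```python
# (dataset, instances, rare_count) copied by hand from rows of the table above.
rows = [
    ("heat", 7400, 833),
    ("housingBoston", 506, 105),
    ("meta", 528, 165),
    ("strikes", 625, 15),
]


def percent_rare(rare_count: int, instances: int) -> str:
    """Format the share of rare cases as a percentage with two decimals."""
    return f"{100 * rare_count / instances:.2f}%"


# Assumption: "% Rare" = "# Rare" / "Instances"; this is inferred from the
# table values, not stated by the repository.
summary = {name: percent_rare(rare, n) for name, n, rare in rows}

for name, pct in summary.items():
    print(f"{name}: {pct}")
```

A row with `# Rare` equal to 0 gives `0.00%`, which matches entries such as `laser` and `quake`.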

## References

[1] Torgo, L. (2016). Data mining with R: Learning with case studies (2nd ed.). Chapman & Hall/CRC. http://ltorgo.github.io/DMwR2

[2] Nash, W., Sellers, T., Talbot, S., Cawthorn, A., & Ford, W. (1994). Abalone. UCI Machine Learning Repository. https://doi.org/10.24432/C55C7W

[3] Moniz, N., Ribeiro, R. P., & Margarido, M. (2023). accel: Acceleration dataset [Dataset in the IRon R package, version 0.1.4]. https://CRAN.R-project.org/package=IRon

[4] Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., & Herrera, F. (2011). KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, 17(2–3), 255–287.

[5] Brooks, T., Pope, D., & Marcolini, M. (1989). Airfoil self-noise [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5VW2C

[6] Candanedo, L. (2017). Appliances energy prediction [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5VC8G

[7] Schlimmer, J. (1985). Automobile [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5B01C

[8] Camacho, L., & Bação, F. (2024). WSMOTER: A novel approach for imbalanced regression. Applied Intelligence, 54, 1–11. https://doi.org/10.1007/s10489-024-05608-6

[9] Rasmussen, C. E., Neal, R. M., Hinton, G. E., et al. (1996). DELVE: Data for evaluating learning in valid experiments [Software and dataset repository]. University of Toronto. https://www.cs.toronto.edu/~delve/

[10] Journal of Statistics Education. (1992). Pay for play: Are baseball salaries based on performance? [Dataset]. Dataset available from the Journal of Statistics Education. https://jse.amstat.org/datasets/baseball.dat.txt

[11] Carnegie Mellon University. (2016). StatLib: A data and software archive [Online dataset repository]. https://lib.stat.cmu.edu

[12] OpenML contributors. (2025). OpenML dataset 1051 [Dataset]. OpenML: An open platform for machine learning. https://www.openml.org/d/1051

[13] Yeh, I.-C. (1998). Concrete compressive strength [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5PK67

[14] Fortuna, L., Graziani, S., Rizzo, A., & Xibilia, M. G. (2007). Soft sensors for monitoring and control of industrial processes. In Advances in Industrial Control. Springer London. https://doi.org/10.1007/978-1-84628-480-9

[15] Torgo, L. (2019). Regression data sets [Online dataset repository]. LIACC / University of Porto. https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html

[16] Cortez, P., & Morais, A. (2007). Forest fires [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5D88D

[17] University of Toronto, DELVE Project. (2003). DELVE: Data for evaluating learning in valid experiments [Online dataset repository]. https://www.cs.toronto.edu/~delve/

[18] Zhou, F., Claire, Q., & King, R. D. (2014). Geographical origin of music [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5VK5D

[19] Necrothapa, S. (2020). Ames housing dataset [Dataset]. Kaggle. https://www.kaggle.com/datasets/shashanknecrothapa/ames-housing-dataset

[20] Schirmer, C. (2020). Boston housing [Dataset]. Kaggle. https://www.kaggle.com/datasets/schirmerchad/bostonhoustingmlnd

[21] Elkan, C. (2001). Magical thinking in data mining: Lessons from CoIL Challenge 2000. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 426–431). ACM. https://doi.org/10.1145/502512.502576

[22] Director's Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma, Shedden, K., Taylor, J. M., Enkemann, S. A., Tsao, M. S., Yeatman, T. J., Gerald, W. L., Eschrich, S., Jurisica, I., Giordano, T. J., Misek, D. E., Chang, A. C., Zhu, C. Q., Strumpf, D., Hanash, S., & Shepherd, F. A. (2008). Shedden_2008: Gene expression–based survival prediction in lung adenocarcinoma [Dataset]. Lung Cancer Explorer, UT Southwestern. https://lce.biohpc.swmed.edu/lungcancer/datasetsearch.php?datasetid=1

[23] Branco, P., Torgo, L., & Ribeiro, R. P. (2025). Imbalanced regression data sets [Online dataset repository]. University of Porto. https://paobranco.github.io/Imbalanced-Regression-DataSets/

[24] Meta-data. (1994). Meta-data [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5X31P

[25] Board of Governors of the Federal Reserve System. (2025). H.15 selected interest rates [Statistical release and dataset repository]. https://www.federalreserve.gov/releases/h15/

[26] OpenML contributors. (2025). OpenML dataset 409 [Dataset]. OpenML platform. https://www.openml.org/d/409

[27] Coleman, D. (1986). pollen: Geometric features of pollen grains [Dataset]. StatLib Archive, Carnegie Mellon University. https://lib.stat.cmu.edu/data-expo/pollen.data

[28] Brien, C. J., & Payne, R. W. (1999). Tiers, structure formulae and the analysis of complicated experiments. Journal of the Royal Statistical Society: Series D (The Statistician), 48(1), 41–52.

[29] OpenML contributors. (n.d.). space_ga [Dataset]. OpenML. https://www.openml.org/d/507

[30] Carnegie Mellon University, Department of Statistics. (2016). StatLib: A data and software archive [Online dataset repository]. https://lib.stat.cmu.edu/datasets/

[31] Tibshirani, R. J. (2015). strike: Annual strikes data for OECD countries (1951–1985) [Dataset]. Dataset used in course “Statistical Computing,” Carnegie Mellon University. http://www.stat.cmu.edu/~ryantibs/statcomp-F15/homework/strike.txt

[32] OpenML contributors. (2025). sulfur [Dataset]. OpenML. https://www.openml.org/d/23515

[33] Hamidieh, K. (2018). Superconductivity data [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C53P47

[34] King, R. D., Hirst, J. D., & Sternberg, M. J. E. (1994). A comparison of artificial intelligence methods for modelling QSARs. Applied Artificial Intelligence, 9, 213–234. / Hirst, J. D., King, R. D., & Sternberg, M. J. E. (1994). Quantitative structure–activity relationships by neural networks and inductive logic programming: II. The inhibition of dihydrofolate reductase by triazines. Journal of Computer-Aided Molecular Design, 8(4), 421–432. https://doi.org/10.1007/BF00125376

[35] Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Wine Quality [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C56S3T

[36] Gerritsma, J., Onnink, R., & Versluis, A. (1981). Yacht Hydrodynamics [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XG7R
